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Covariate adjustment is an important tool in the analysis of randomized clinical trials 
and observational studies. It can be used to increase efficiency and thus power, and 
to reduce possible bias. While most statistical tests in randomized clinical trials 
are nonparametric in nature, approaches for covariate adjustment typically rely on 
specific regression models, such as the linear model for a continuous outcome, the 
logistic regression model for a dichotomous outcome and the Cox model for survival 
time. Several recent efforts have focused on model-free covariate adjustment. This 
paper makes use of the empirical likelihood method and proposes a nonparametric 
approach to covariate adjustment. A major advantage of the new approach is that it 
automatically utilizes covariate information in an optimal way without fitting non- 
parametric regression. The usual asymptotic properties, including the Wilks-type 
result of convergence to a x 2 distribution for the empirical likelihood ratio based test, 
and asymptotic normality for the corresponding maximum empirical likelihood esti- 
mator, are established. It is also shown that the resulting test is asymptotically most 
powerful and that the estimator for the treatment effect achieves the semiparametric 
efficiency bound. The new method is applied to the Global Use of Strategies to Open 
Occluded Coronary Arteries (GUSTO)-I trial. Extensive simulations are conducted, 
validating the theoretical findings. 
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1. INTRODUCTION 

Testing for the statistical significance of treatment differences is a key element in 
the analysis of randomized clinical trials. In its simplest form, patients are randomly 
allocated to either a treatment or control group and their responses are recorded. 
Many statistical methods are available for testing whether there is convincing evi- 
dence that a treatment difference exists between the two groups; cf. Pocock (1983) 
and Friedman, Furberg and DeMets (1998). In addition to treatment allocation and 
outcome values, baseline covariate information is often collected in such clinical stud- 
ies. Classical analysis of covariance (ANCOVA) and other regression model-based 
tests may be used to handle covariate adjustment; cf. Scheffe (1959), Simon (1984), 
McCullagh and Nelder (1989) and Rutter and Elashoff (1994). When properly used, 
covariate adjustment can increase efficiency and, in the case of an observational study, 
reduce bias (Armitage 1981). 

Due to randomization, most two-sample (multi-sample if more than two treatment 
groups are involved) tests are valid without any parametric assumption. Therefore, 
these tests are nonparametric in nature, a feature of great importance in a clinical 
trial. Standard methods for covariate adjustment, however, require that a specific 
regression model be assumed; see, for example, Piantadosi (2005, Chapter 17). 

Adjusting for covariates without assuming a regression model has been studied by 
Koch (1998), Tsiatis, Davidian, Zhang and Lu (2008) among others. In particular, 
Koch (1998) proposed a weighted least squares method to include covariate informa- 
tion for estimating the treatment difference. This method always leads to a variance 
reduction, thus an increase in power. By appealing to semiparametric efficiency the- 
ory, Tsiatis et al. (2008) developed a general approach to covariate adjustment that 
circumvents modeling the covariate-outcome relationship. Their approach allows for 
nonlinear terms in relating the auxiliary covariates to the outcome variable, thereby 
further reducing the variability. They showed that the method is semiparametrically 
efficient by deriving the semiparametric information bound and by showing the bound 
is attained with their approach. 
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An essential ingredient in the approach by Tsiatis et al. (2008) is the use of the 
independence of treatment allocation and baseline covariates to construct equations 
associated. These equations can be viewed as constraints that, when properly utilized, 
may lead to further reduction in variability of the outcome variable. How to optimally 
use these constraints is therefore crucial for efficiency improvement. 

Empirical likelihood (Owen 1988) is a general method for efficiently utilizing con- 
straints or estimating equations. Specifically, it maximizes the nonparametric like- 
lihood (Kiefer and Wolfowitz 1956) subject to certain constraints that are specific 
to the problem of interest. It can be used to obtain empirical likelihood ratio tests 
as well as confidence intervals. Examples include testing and interval estimation for 
population means and for regression coefficients. Qin and Lawless (1994) showed 
that the constraints can be used more liberally in the sense that the number of con- 
straints may exceed the number of parameters of interest. They also showed that the 
empirical likelihood utilizes the information in the constraints in an optimal way. 

Because baseline covariate information for a randomized clinical trial generates 
constraints, it is natural to consider the empirical likelihood as a means to improve 
efficiency for the primary problem of testing and estimating treatment difference. To 
that end, this paper proposes a general approach to covariate adjustment by making 
use of the empirical likelihood and suitably choosing constraints. The new approach 
does not require any model assumption on the relationship between the outcome 
variable and baseline covariates. It is shown that such an empirical likelihood based 
method automatically results in efficiency improvement. For testing, it is asymp- 
totically most powerful; for estimation, it achieves the semiparametric information 
bound. 

The rest of the paper is organized as follows. In Section 2 we introduce some 
notation and briefly discuss existing model-based methods. We apply the empirical 
likelihood method for covariate adjustment and extend it to inference with growing 
number of constraints in Section 3. The design and results of simulation studies 
are described in Section 4. In Section 5, the method is applied to a study of acute 
myocardial infarction. Some concluding remarks are given in Section 6. 



2. NOTATION AND MODEL SPECIFICATION 

In a (K + l)-arm [K > 1) randomized clinical trial, for subject i, let Yi, Zi and Xi 
denote the outcome, treatment allocation and available auxiliary baseline covariates, 
respectively. Assume that (Yi, Zi, X{), i — 1, . . . ,n, are independent and identically 
distributed (i.i.d.) and that the random allocation probabilities 7T& = P(Z = k), 
k = 0, . . . , K, where Y^k=o n k = 1> are known. 

Throughout, G k denotes the conditional distribution of the outcome variable Y 
given treatment allocation Z = k, k = 0, . . . , K. Then the usual null hypothesis of 
no treatment difference is given by 

U . ryQ _ _ r~iK 

Hq . (j — (_r — ... — Lt 

Note that there is no assumption on the form of {G k , k = 0, . . . , K}. 
To study treatment effects, one may choose certain contrasts among the treatment 

groups in terms of their population characteristics, for example, the difference in 

mean outcomes between two treatment groups. Following Zhang et al. (2008), the 

treatment effect can be identified by considering 

(1) 0i = E(Y\Z = 0), p 2 = E(Y\Z = 1) - E(Y\Z = 0), 

or equivalent ly, by formulating 

(2a) E(Y\Z) = f3 1 + (3 2 Z. 

Clearly, such an approach does not require model assumption on the underlying 
distribution functions G k , k = 0, . . . , K. If there are more than two treatment groups, 
equation (j2a|) becomes 

(2b) E(Y\Z) = (3 l + /3 2 l (z =i) + • . • + Pk+iI(z=k), 

where 1(.) is the indicator function and Pk+i represents the difference in mean outcome 
between group k and group 0. For a binary outcome, an alternative formulation is 
via the log-odds ratios: 

(3) logit{P(F = 1\Z)} = log =0 1 + /3 2 i (z=1) + . . . + K+1 1 (Z=K) . 
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Under this formulation, testing the null hypothesis of no treatment difference is tan- 
tamount to testing H : f3 2 = ■ ■ ■ = Pk+i = 0, and estimating the treatment effect is 
tantamount to estimating values of the (3k, k = 2, . . . , K + 1. For notational conve- 
nience, we use (3 to denote the parameter vector (j3i, . . . , (3k+i) t '■ 

Besides the outcome variable and treatment assignment, relevant baseline co- 
variates, which may comprise patients' demographic information, medical history, 
lifestyle measurements, etc., may be recorded as well. Their association with and 
impact on the outcome variable can then be explored for efficiency gains in testing 
and estimation of treatment effects. A common approach to adjusting for covariates 
is to postulate a certain regression model, which gives treatment comparisons condi- 
tional on values of the covariates. It is well known that treatment effects may have 
different interpretations in conditional and unconditional (on covariate value) models. 
Indeed, except for linear and exponential regression models, the conditional and un- 
conditional approaches generally lead to different parameter values for the treatment 
effect. We refer to Gail (1984) for a comprehensive discussion on the subject. 

Since the unconditional treatment effect is of primary interest here, it is natural for 
us to avoid any modeling of the relationship between the outcome variable and base- 
line covariates. Yet it is also desirable that we make best use of the information in 
the covariates to improve efficiency To this end, we explore the empirical likelihood 
methodology to develop a model- free approach to covariate adjustment. We demon- 
strate that such an approach is natural for nonparametric covariate adjustment and 
optimal in terms of efficient use of available information. 

3. EMPIRICAL LIKELIHOOD BASED METHODS FOR 
NONPARAMETRIC COVARIATE ADJUSTMENT 

Being first implicitly used in Thomas and Grunkemeier (1975), empirical likelihood 
was developed into a general methodology by Owen (1988, 1990). Given (Fj, Zj, Xj), 
i = l,...,n, assumed to be independent with a common cumulative distribution 
function (CDF) Fq, the empirical likelihood function is a nonparametric likelihood 
function of the CDF F 
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n 



n 



(4) 



L(F) = Y[dF(y i ,z i ,x i ) = ]ln 




where (yi,Zi,Xi) is the observed value of {Y^Z^ Xi), pi = dF(y iy z^xi) = P(Yi = 
yi, Zi = Zi, Xi = x^, i — 1, . . . , n. Without additional constraints (other than Pi > 
and YH=iPi = 1)> ft i s weu known that the empirical distribution function is the 
nonparametric maximum likelihood estimate of F . 

This section is devoted to the development of an empirical likelihood based method 
for nonparametric covariate adjustment arising from a typical randomized clinical 
trial. Subsection 3.1 develops an empirical likelihood ratio based test and establishes 
its asymptotic properties. The subsequent subsection deals with the dual problem of 
estimating treatment effects via maximizing the empirical likelihood with the number 
of constraints exceeding the number of parameters. Subsection 3.3 extends the results 
of 3.1 and 3.2 to the situation in which the number of constraints increases with 
the sample size. Asymptotic normality and Wilks type y 2 approximation as well 
as asymptotic efficiency are established for all the cases under suitable regularity 
conditions. 

3.1 Testing Treatment Differences 

Empirical likelihood methodology for inference is based on maximizing the nonpara- 
metric likelihood fll]) subject to appropriately formulated and problem-specific con- 
straints. For the two-arm randomized clinical trial specified by f )2a|) . the constraints 
are generated by 



m((3; Y, Z) = (1, l ( z=i), • • • , 1{z=k)) T {Y ~ Pi ~ Ail(z=i) - • • • - P(k+i)\z=k))- 



The zero-mean property of m(f3; Y, Z) uniquely determines the value of (3 and can be 
used to obtain estimators through the sample-generated estimating equations. The 
resulting inference involves only the Yi and Zj. 



(5a) 



m((3;Y,Z) = (l,Z) T (Y-p 1 -f3 2 Z). 



For general K specified by f )2b|) . it becomes 



(5b) 



The availability of the baseline covariates Xi should enable us to obtain addi- 
tional estimating equations, thereby additional constraints. Indeed, Davidian et al. 
(2005) and Leon et al. (2003) found that the following form gives a general family of 
estimating equations: 

K 

(6) 7^(1 (z=fc) - TTk)hk(X), 

k=0 

where hk, k = 0,1, ... ,K are arbitrary functions. The independence of Z and X 
guarantees the zero-mean property of the resulting estimating equations. 

It is clear now that the number of zero-mean estimating equations as provided by 
(151) and exceeds the number of parameters which specify the treatment effect. 
In fact, the number of possible equations that can be generated from can be 
unlimited when the baseline covariates X are continuous. Suppose we fix the choice of 
hk and consider how to make use of them for efficiency improvement. For notational 
simplicity, we use g r ((3; Y, Z, X) to denote an r-vector of the resultant estimating 
equations that include both (jSJ) and (jHJ). Here r > 2 in the two-sample case and 
r > K + 1 for the general (K + l)-sample case. 

It is well known that the empirical likelihood approach links together the inference 
of certain parameters and the available estimating equations to form a constrained 
optimization problem. With constraints given by g r , it maximizes L(F) in (j3J) subject 
to the following constraints: 

n n 

(7) Pi >0, £> = 1, ^2p t g r {j3]Y i ,Z ii X i )=0. 

i=l 8=1 

This optimization problem has a unique maximizer provided that is inside the 
convex hull of {g r (/3; Vi, z iy x t ),i — 1, . . . , n} for a given /3 (Owen 2001). By apply- 
ing the Lagrange multiplier argument (Lang 1987), we can easily get Pi = {n[l + 
A ((3)g r ((3; yi, Zi, a^)]} -1 , where A, which is a function of (3, is the solution to 



nHi , \ T ^^^ fa... „ ^. ^ ~ 



8=1 



1 + A (P)g r (f3;yi,Zi,x 



Therefore, the resulting profile empirical log-likelihood, as a function of (3, takes form 

n 

(9) l E (P) = lo S l 1 + A T (/3)9r(/3; Vi, Zi, Xi 



i=l 



Theorem 3.1. Let f3 T = (0^,0^), where (3 1 and (3 2 are qi- and q2-vectors. Define 
(10) T E = 2l E @ 10 ,0)-2l E @), 

the logarithmic empirical profile likelihood ratio for testing Ho : (3 2 = 0, where (3 10 
minimizes /^(/3 1 ,0) with respect to f3 1 and (3 minimizes Ie{0)- Then, under some 
mild regularity conditions, Te converges to x\ q , 2 ) i> n distribution under H . 



Theorem 13.11 is a direct adaptation of Corollary 5 in Qin and Lawless (1994). It 
enables us to get the p-value in testing the null hypothesis of no treatment difference 
and to invert the test to obtain the confidence limits. A numerical way to find f3, 
and similarly for /3 10 , is to use a two-stage Newton algorithm. We first specify an 
initial value O) for (3 and solve © to obtain A(/3 (0) ). Next, we fix X((3) in © at 
X(0 ^) and minimize over (3 to obtain a new value l \ We iterate the process 
until convergence. 

From Qin and Lawless (1994), it follows that the empirical likelihood ratio test 
incorporating covariate information through constraints g r (J3', Y, Z, X) is always more 
powerful than the one with m(f3; Y, Z) only. Moreover, the more constraints we 
put into g r , the more powerful the test becomes. Because the net effect of the 
empirical likelihood method with more constraints than parameters is an optimal 
linear combination of the constraints, choice of additional constraints should therefore 
be made to avoid redundancy. However, it is not necessary to model the relationship 
between the covariates and the outcome, as is evident from equation (jSJ); this is a 
very desirable feature with important practical implications. 

For a binary outcome variable, if we are interested in using the log-odds ratio, then 
we can replace (f5b|) with 

m(/3; Y, Z) = (1, l (z=1) , . . . , 1 (z =k)) T [Y - + Pi\z=x) + ■■■ + %+i)l(z=^))], 

where </>(•) = exp(-)/[l -|-exp(-)] is the logistic function. We can then follow the same 
steps to construct the empirical likelihood ratio test. As before, the large sample 
properties given by Theorem 3.1 continue to hold. 
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3.2 Maximum Empirical Likelihood Estimate of Treatment Effect 

Without adjusting for baseline covariates, the number of estimating equations, de- 
rived from the score functions, equals the number of parameters. Solving equations 
Y^7=i m (/3; y~i, Zi) = gives us the M-estimator for (3, which is known to be consis- 
tent and asymptotically normal (Huber 1981). With covariate adjustment, we have 
additional estimating equations containing auxiliary information through Since 
the number of all available estimating equations r exceeds the number of parameters 
q = qi + q2, we cannot obtain the estimators simply by finding zeros of those estimat- 
ing equations. One way to handle overly constrained problem is to form g-dimensional 
linear combinations of all available estimating equations so that the resulting set of 
equations has a unique solution. One can further evaluate the limiting covariance 
matrix of the estimator to identify the optimal choice of such linear combinations; cf. 
Goldambe and Heyde (1987). Because the empirical likelihood method with overly 
constrained estimating equations can result in the optimal combination (Qin and 
Lawless 1994), it provides a nature alternative. The following result follows directly 
from Qin and Lawless (1994). 

Theorem 3.2. Let D r = E[dg r (f3 )/df3 T ] and S r = E(g r g^). Then, under certain 
regularity conditions, we have 

(ii) ™ 1/2 (3 - A,) n(o, (d^a.)- 1 ), 

where (3 is the maximum empirical likelihood estimate (MELE). 

The theorem above allows us to construct Wald-type confidence intervals using 
the robust variance estimate. From Corollary 2 of Qin and Lawless (1994), it fol- 
lows that (3 has the smallest asymptotic variance among all the g-dimensional linear 
combinations of g r ({3; Y, Z, X). In particular, when r = q, the maximum empirical 
likelihood estimator (3 will be asymptotically equivalent to the M-estimator. Fur- 
thermore, Corollary 1 of Qin and Lawless (1994) ensures that the more constraints 
being put into the optimization problem, the more precision one can achieve. 

As an example, consider again a two-arm clinical trial with a binary outcome 
variable and a continuous covariate X, and suppose the log-odds ratio is of interest. 
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We can incorporate both linear and quadratic terms of X by using constraints 

g r (fr Y, Z, X) = ((1, Z)[Y - 0(A + (5 2 Z)l (Z - m), (Z - m)X, (Z - n 1 )X 2 Y. 

The resulting estimator will be more efficient than the M-estimator from (1, Z) T [Y — 
(f)((3i + P2Z)}. Note that, for regression model based covariate adjustment, Robinson 
and Jewell (1991) demonstrated that including predictive covariates in the logit will 
always result in a loss of precision. In contrast, for our empirical likelihood approach, 
including predictive covariates in the constraints will never lead to an increase in 
the asymptotic variance. The fact that incoporating additional estimating equations 
always improves efficiency makes the empirical likelihood approach advantageous and 
convenient. 

3.3 Empirical Likelihood With Growing Number of Constraints 

Since we can achieve more precision by increasing the number of constraints, it is 
intuitive that semiparametric efficiency may be attained when the number of con- 
strains grows with the sample size. In this connection, we consider in this subsection 
the empirical likelihood based covariate adjustment when the number of constraints 
grows to infinity as n — > 00. Note here that the dimension of (3, which is of primary 
concern, remains fixed. 

Suppose besides the g-dimensional score m(/3;Y, Z), the auxiliary information is 
contained in an r n -vector of estimating equations g* n (fl) = (m T (f3;Y, Z),V^) T '. 
Instead of a fixed number r, r n here will grow to infinity with n at a certain rate. The 
jth com ponent of V n has the form (l(z=fe) — ^k)hj(X) for j — 1, . . . , r n — q, where hj 
is a real-valued function. The following conditions will be used. 
(CI) There exists a non-random (r n — q) x (r n — q) matrix W n such that (i)-(iii) 
below are satisfied for g rn (/3) = (m T (/3; Y, Z), (W n V n ) T ) T . 

(i) Components of g ni , % = l,...,n, are uniformly bounded by a finite constant 
M > 0, where g n ^(3) = g rn {f3; Y h Z h X,). 

(ii) Eigenvalues of S„ i9 = E(g rn (f3 )gJ n (f3 )) are bounded away from zero and infinity. 
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(iii) There exists a q x (r n — g) non-random matrix A n such that 

K 

A n W n V n -> £(l ( z=*) - 7r fe )£;(m(/3; Y, Z)|Z = fe, X) in L 2 . 

fc=0 

(C2) The growth rate of r n is limited to = o(n). 
(C3) Matrix S = E(mm ) is positive definite, where 

A' 

m = m(/3; F, Z) - £(l(z=*) - 7r fe )£(m(/3; Y, Z)\Z = k, X). 

k=0 

Theorem 3.3. Let (3 n be the maximum empirical likelihood estimate based on con- 
straints g* (/3) and D m = E(dm(f3 ) / d(3 T ) . Then, under Conditions C1-C3, 

(12) n^iX - f3 ) -> Nfo, {D T jr l D^- 1 ) . 

Minimizing the asymptotic variance of the M-estimator from the class of arbitrary 
g-dimensional unbiased estimating equations, Zhang et al. (2008) derived the semi- 
parametric efficiency bound for the estimators of treatment effect. From Zhang et al. 
(2008) and Theorem I3.3[ we have the following result. 

Corollary 3.4. The limiting variance-covariance matrix, {D m Yi Dm)' 1 , achieves 
the semiparametric efficiency bound, i.e., (3 n in Theorem \3.3\ is asymptotically effi- 
cient. 

In practice, in order to construct the Wald type confidence interval for (3 , we need 
to estimate the asymptotic variance expressed in f[T2l . Let g~ n (0) = n~ x ^ILi 9 n i(P)i 
S n ({3) = n^YJUa^Mg^M and D((3) = dg n ({3)/d{3 T . Theorem EE below 
shows that a consistent estimate of the limiting variance-covariance matrix of n l l 2 ((3 n — 

p ) is [b^s^iXmh)]' 1 - 



Theorem 3.5. Under Conditions C1-C3, 



Op(l). 



Throughout, ||-|| is used to denote the Euclidean norm. Theorem 13 . 3 1 st at es that the 
listed conditions are sufficient to ensure standard asymptotic properties of the MELE. 
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Moreover, Corollary 13.41 states that when the number of constraints grows to infinity 
at a certain rate, the MELE achieves the semiparametric efficiency as derived in 
Zhang (2008). In Theorem 13.31 g rn is essentially a linear transformation of g* . Since 
a linear transformation does not change the constraints, the estimator using g Tn will 
be the same as that using g* . The fact that the MELE will not be affected by a linear 
transformation of the constraints greatly facilitates the applicability of the empirical 
likelihood approach because we can just throw in all the constraints we have without 
forming the appropriate combination of them. For example, E[g* Tn (g* rn ) T ] might be 
ill conditioned but we can still use it as long as there exists a W n such that the 
corresponding S„ i9 is better conditioned. For this reason, we will not distinguish 
among linear transformations of constraints in the following discussion. 

Theorem 13.31 holds for a general g-dimensional score m as long as some regularity 
conditions in the case of fixed number of constraints (Qin and Lawless 1994) are 
satisfied, including E^dm(f3; Y, Z)/df3 T ^j is of full rank p, \\dm(/3] Y, Z)/df3 T \\ and 
||<9 2 m(/3; Y, Z)/df3d(3 T \\ can be bounded by some integrable function in a neighbor- 
hood of /3 and dm(f3;Y } Z)/df3 and d 2 m(f3;Y, Z)/df3df3 T are continuous in this 
neighborhood. 

Condition C2 imposes an upper bound on the growth rate of the number of con- 
straints at which a well-behaved MELE can be obtained. In practice, the number 
of constraints need not be large. In fact, we find that additional gain by including 
an extra constraint diminishes quickly, due to the optimal use of constraints by the 
empirical likelihood method. It is important to note that the asymptotic normality 
and efficiency are not affected by the choice of r n , as long as it satisfies C2. It is 
certainly of theoretical interest to find the sharp upper bound for r n to grow such 
that the resulting estimate is still asymptotically normal and efficient. But we will 
not get into this complication here since finding the optimal rate is not our main 
concern. If we knew the conditional expectations in Condition C3, the optimal es- 
timating equations m would be the constraints that lead to the optimal estimator. 
Although they are unknown in practice, it is clear that Condition C3 is fairly mild. 
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For Condition CI, we need to make use of the orthogonality and boundedness of cer- 
tain basis functions to properly design h(X) in the constraints. Suppose Z = 0, 1,2 
and the empirical CDF of the one dimensional auxiliary covariate X is F n {x) = 
77,-1 Y^i=i ^{Xi<x}- By making use of multivariate Fourier expansion, the arguments 
can be generalized to the high dimensional auxiliary covariate case. Let g* Tn {(3) = 
(m T (f3; Y, Z), (l x -tti), s n , c u , . . . ,s ldn ,c ldn , (1 2 -tt 2 ), s 21 , c 2 i, . . . , s 2dn , c 2dn ) T , where 
lfc = l(z=fc), r n = 4:d n +q+2, Sij = (1—71-*) sin(27rjF n (X)), c^- = (1— 7T;) cos(27rjF n (X)), 
i — 1, 2, j — 1, . . . , d n . It can be shown that, when d n = o(n 1//4 ), (i)-(iii) are satis- 
fied. For example, we can apply the fact that those basis functions are orthogonal 
when their arguments are U[0, 1] and they are bounded to show (i) and (ii) hold. 
Because the procedure is invariant under linear transformations, the eigenvalues can 
grow with n if all of them grow at the same rate. However, we do not believe in 
general they can grow at different rates since the covariance matrix is sandwiched in 
the variance-covariance expression, which needs to be well-conditioned. Furthermore, 
(iii) can be verified by taking the expansion of the conditional expectations. Likewise, 
we may apply other orthogonal basis functions that are bounded. For example, we 
can use the Legendre polynomials of (2F n (X) — 1) which are bounded by 1 on [-1,1]. 
Legendre polynomials, i.e. l,x, (3x 2 — l)/2, . . ., are linear transformations of polyno- 
mial terms l,x,x 2 , . . .. Therefore we can also use polynomial terms of (2F n (X) — 1) 
in the auxiliary constraints due to linear transformation invariance of the empirical 
likelihood. As pointed out by a referee, the standard independence assumption for 
empirical likelihood is violated due to the plug-in estimator F n . Intuitively, the valid- 
ity of using F n instead of F relies on the fact that those constraints are still zero-mean 
conditioning on all the covariates. A rigorous proof can be found in the Appendix. 

Analogous to the case with a fixed number of constraints, let l(/3) = Yli=i 1°§ + 
\ n ((3)g ni ((3) j. The empirical likelihood ratio statistic for testing H : (3 = (3 is 

(13) T ln = 2l(f3 o )-2l0 n ). 

Then under Conditions C1-C3, the Wilks type theorem of convergence to the x 2 
distribution is still valid for testing the null hypothesis of no treatment effect. 
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Theorem 3.6. Suppose that Conditions C1-C3 are satisfied. Then, under the null 
hypothesis H Q , T\ n converges in distribution to x\ q \ as n ^ oo. 

More generally, we can test hypothesis on a subset of treatment effects (3 instead 
of all components of it. For instance, we may be interested in testing whether (3 2 = 
in the simple example ff2a| . Specifically, let f3 T = ((3 7 ,(3 2 ) T , where (3 1 and /3 2 are 
qi- and q 2 - vectors, respectively. For H : f3 1 = f3 w , the profile empirical likelihood 
ratio test statistic is simply 

(14) T 2n = 2/(/3 10 ,3 20 )-2/(3j, 

where (3 20 minimizes /(/3 10 ,/3 2 ) with respect to j3 2 . The following result shows that 
a Wilks type \ 2 approximation still holds. 

Corollary 3.7. Suppose that Conditions C1-C3 are satisfied. Then, under the null 
hypothesis, T 2n converges in distribution to xt qi ) as n ~ * 00 ■ 

Auxiliary information can be used to not only increase the precision of estimated 
treatment effects, but to also increase power in hypothesis testing. To evaluate power, 
we need to derive the asymptotic distribution of the test statistic under the alternative 
hypothesis. We shall consider the contiguous alternative which deviates from the null 
by the order of 0(n~ 1//2 ); cf. Hajek, Sidak and Sen (1999) and Serfling (1980). For 

rp — 1 

notational convenience, let A = D, m S D m and write 

An A 12 

A = 

A 2 i A 22 

where A €j = E(dm T (/3 )/d/3 i )^ 1 E(dm((3 )/df3j), i = 1,2 and j = 1,2. 

Theorem 3.8. Suppose that Conditions C1-C3 are satisfied. Then under the se- 
quence of contiguous alternatives H a : (3 = {3 a = f3 + h/y/n, the empirical likelihood 
ratio test statistic T ln converges in distribution to a noncentral x 2 with degrees of 
freedom q and noncentrality parameter h 7 Ah. 

Similarly, the noncentrality parameter of the limiting \ 2 distribution becomes the 
projected Fisher information when there are nuisance parameters. 
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Corollary 3.9. Under the same assumptions as those in Theorem \3.S\ and with H a 
replaced by H a : (3 1 = (3 la = f3 w + hi/y/n, the empirical likelihood ratio test statistic 
T2 n in [Lfy converges in distribution to a noncentral y 2 with degrees of freedom q\ 
and noncentrality parameter h\(An — Ai 2 A 22 1 A 2 i)^i. 

It can be seen that the empirical likelihood approach reproduces the standard 
asymptotic results in parametric likelihood theory (Cox and Hinkley 1974). Similar 
to the estimation problem, adding more constraints will result in more powerful tests. 
When the number of constraints goes to infinity, the corresponding tests become 
asymptotically most powerful. 



4. NUMERICAL STUDIES 

In this section, we discuss computational issues arising from implementing the 
constrained optimization problems and report simulation results associated with the 
empirical likelihood based covariate adjustment method. 

The primary step in computing the empirical likelihood is to maximize (T4j) subject 
to constraints ([7]). The lagrangian is 

n n n 

P*(p,/3, A, 7) = ^log^pi) +n\ T ^2p i g r (f3;y i ,z h x l ) +n-f(£2pi - 1), 

i=i i=i i=i 

where A and 7 are the Lagrange multipliers and log* is a modified natural logarithm 
defined in Owen (2001). Thus, we obtain estimators for p and (3 by differentiating 
P* with respect to p, f3, A and 7 and setting them to 0. 

Working directly with n + q + r + 1 free variables involves gradient and Hessian 
matrices of daunting dimensions. Alternatively we may use the two-stage Newton 
algorithm as discussed in Section 3.1 that can eliminate some parameters. Nonethe- 
less, unlike the usual testing case where (3 is fixed at /3 , the outer stage in the 
two-stage Newton algorithm, i.e. minimization over (3 while keeping A fixed, is dif- 
ficult in practice because of the possibility of a non-positive definite Hessian ma- 
trix. Zedlewski (2008) points out that "Concentrating out some parameters leads 
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to a smaller optimization problem, but it can make it more difficult. Thus the two- 
stage Newton algorithm is fast but unreliable and can lead to frustrating convergence 
problems. In most cases n is much greater than q + r, so the largest block of the 
Hessian is an n x n diagonal matrix." . In our implementation, we use a Matlab pack- 
age "matElike" , which solves the primal problem by including modern optimization 
codes exploiting matrix sparsity. We find the package to be both robust and fast. 
The link to the Matlab package and the code to implement our method can be found 



at http : //www. stat . Columbia. edu/~xwu/sof tware .html 



4.1 Estimation 

The simulation results reported below are all based on 5000 Monte Carlo replications. 
The sample size is chosen to be 200 throughout. We consider the case of two treatment 
groups with the treatment indicator Z generated with P(Z = 0) = P(Z = 1) = 0.5. 
The response variable Y is binary with logit-f-E^Y \Z)} = (5\ + The parameter 
of interest is either f3 = (ftx, (3 2 ) T or (3 2 . 

In the first scenario, the auxiliary covariate X is generated as a one dimensional 
Normal random variable with mean and different variances. The magnitude of the 
variance correlates with the influence of X on the response. Given Z and X, Y is 
then generated as Bernoulli according to logit{P(F = 1\Z = g,X)} = a 0g + a g X, 
where a o = 0.3, a i = 1, a = 1, «i = 1.5 and g — or 1. 

From Table [TJ we see that when the standard deviation of X is 2, the Monte Carlo 
standard errors gradually decrease and approach the optimal ones. From "marginal" 
to "5 Fourier" , the standard errors drop significantly. However, additional constraints 
beyond "5 Fourier" do not appear to have much impact on further variance reduction. 
Note that a large number of additional constraints require substantially more com- 
puting time. Thus, we will only compare the results of "marginal" with "5 Fourier" 
in the other cases. A single (i.e., nonparallel) process that calculates the maximum 
empirical likelihood estimate and the p-value for testing the null hypothesis of no 
treatment difference takes, on average, less that 2 seconds to run for a data set of 
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200 samples using 5 constraints. The computation time is estimated using a 2.33GHz 
processor on a server with 8GB RAM. 

Table [T] also shows that the means of Monte Carlo estimates differ from the true 
value of (3 at the third decimal place and the coverage probabilities are around 0.95. 
The Monte Carlo standard errors of estimates from five estimating equations are 
generally smaller than those from marginal models. The improvement becomes more 
pronounced when the variance of X becomes larger. Also, the average length of 95% 
Wald confidence intervals are smaller than those of marginal models. 

In the second scenario, the link function is quadratic in X, i.e., logit{P(Y = 1\Z = 
g, X)} = ao g + a g X 2 , with the same a$g and a g values, g = 0,1. From Table El we 
see that the coverage probabilities are satisfactory and close to their nominal levels 
as in the first scenario. The biases are slightly larger, however, they are still small 
relative to the standard errors. As expected, the Monte Carlo standard errors and 
the average lengths of 95% Wald confidence intervals from five estimating equations 
are smaller than those from the two marginal ones. 

In the third scenario, there are two auxiliary covariates X\ and X2 and the response 
Y is generated as logit{P(Y = \\Z = g,X)} = a 0g + cei g Xi + a 2g X 2 , g=0,l, with 
«oo = 0.3, «oi = l,ctio = l,an = 1-5, 0:20 = 2,a2i = 1.5. The estimating equations 
for the marginal method remain the same since there is no covariate adjustment 
involved. Let k(Z) = \/2(2Z — 1) and Wk = 27rF ri (Afc), k — 1,2. The empirical 
likelihood method with constraints, k(Z), k(Z) sinfWi), n(Z) cos(W / i), k(Z) sin(iy 2 ), 
k(Z) cos(W2), except the marginal estimating equations is denoted by "7 Fourier". 
From Table |3l the performance of the estimates is similar to the previous two scenar- 
ios. 

4.2 Testing 

With the same data generating process as in the preceding subsection, the corre- 
sponding hypothesis testing results are presented in Tables HJ |5] and El In each 
scenario, the profile empirical likelihood ratio test is used to test the null hypothesis 
Hq : 02 = 0. CovProb denotes coverage probabilities for testing (3 2 = ^20 • We have 
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the following observations. First, in all three tables, both coverage probabilities of 
the profile empirical likelihood ratio tests are close to the nominal 95% level. Second, 
the attained power from 5 estimating equations is larger than that from marginal 
estimating equations. Third, when X is one dimensional, the gain in power is more 
significant as the standard deviation of X increases. 

5. APPLICATION 

We apply the proposed empirical likelihood based approach to the Global Use of 
Strategies to Open Occluded Coronary Arteries (GUSTO)-I trial data, which were 
kindly provided to us by Karen Pieper from the Duke Clinical research Institute. 
The primary endpoint was 30-day death, which occurred in 6.29% of 10366 patients 
randomly assigned to tissue plasminogen activator (TPA) (g=l), 7.32% of 10354 pa- 
tients randomly assigned to skreptokinase (SK) with IV heparin (g=2), 6.99% of 
10303 patients randomly assigned to a combination of SK and TPA (g=3) and 7.24% 
of 9773 patients randomly assigned to SK with SQ heparin (g=4). Besides treatment 
assignment and outcome, some baseline auxiliary covariates concerning demographics 
(age, sex, weight, height), risk factors (hypertension, diabetes, smoking, hypercholes- 
terolemia), other history (family history of MI, previous MI, previous angina, previous 
revascularization) and presenting characteristics (blood pressure, tachycardia, ante- 
rior infarct location, killip class, ST elevation on electrocardiography) were recorded 
on each subject. In Steyerberg et al. (2000), the relative prognostic strength of 17 
baseline covariates was evaluated by their univariate x 2 model, which was calculated 
as the difference in -2 log-likelihood between a univariate logistic regression model 
with and without the characteristic. The strongest prognostic factor was age and this 
was further confirmed by the R 2 measure on the log-likelihood scale, which approxi- 
mately indicated the percentage of variance explained. Except for the calculation of 
correlation, adjustment for important predictors such as age is always recommended 
in the case of short-term death after acute myocardial infarction. Thus, we will 
compare unadjusted and age-adjusted results for the four treatment groups. 
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The marginal model between the 30-day death (Y) and treatment assignment (Z) 
is given by \ogit{E(Y\Z)} = ft + f3 2 l {z=2) + /3 3 1 {Z=3) + ftl ( z=4)- For the age(X) 
adjustment, we use 9 auxiliary constraints (l(z= 9 ) — 0.25), (l(z= fl ) — 0.25)F n (x) and 
(l(z= g ) — 0.25)F^(x), (/ = 2,3,4, where F n (x) is the empirical c.d.f. of age. 

The unadjusted estimates (ft, ft, ft, ft) are (-2.7014, 0.1630, 0.1129, 0.1517) with 
standard errors (0.04109, 0.05619, 0.05670, 0.05557). Estimates adjusted for age are 
(-2.7014, 0.1628, 0.1126, 0.1521) with standard errors (0.04109, 0.05619, 0.05670, 
0.05556). The p- values for the unadjusted and adjusted hypothesis testing of ft = 
P>% — 1^4 = are 0.0136 and 0.0135, respectively. 

The unadjusted test is already significant, so the additional improvement in p- value 
after covariate adjustment only reconfirms the scientific conclusion. However, if the 
sample size were smaller, the change in p-value might be more consequential. For 
illustrative purposes, we randomly draw a subsample of size 20000 from the complete 
data and pretend that is what we had in reality. In one of these cases, the p-values 
for the unadjusted test of ft = ft = ft = is 0.0391 while it becomes 0.0362 after 
adjusting for age. In another case, it changes from 0.0508 to 0.0458. 

6. DISCUSSION 

Nonparametric covariate adjustment is of importance in analysis of randomized 
clinical trial data. When properly done, it can result in efficiency improvement while 
maintaining the nonparametric nature of the usual tests. Empirical likelihood ap- 
proach is nonparametric, constraint based and efficient in extracting information from 
data. 

For randomized clinical trials, covariate information with no model assumption 
can be extracted from certain type of constraints or estimating equations. We pro- 
pose an empirical likelihood based approach for covariate adjustment. The resulting 
likelihood ratio test is shown to have the usual Wilks type x 2 approximation, with 
increased power as the number of constraints increases. The corresponding max- 
imum empirical likelihood estimate also enjoys similar asymptotic properties. We 
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demonstrate that the \ 2 an d normal approximations continue to hold as the num- 
ber of constraints grows with sample size. We further show that in doing so the 
semiparametric efficiency can be achieved. 

One of the practical issues is how to select basis functions in the constraints. From 
our experiences with simulations and real data analysis, it appears that there is no 
universal way to deal with this issue. A related issue is how many basis functions 
should be used. One ad hoc way to do that is to consider variance reduction when 
additional constraints are added. We believe that if initial basis functions are properly 
chosen, then only a very small number of constraints will be needed. 

It will be of interest to extend this empirical likelihood based nonparametric co- 
variate adjustment to other situations, including observational studies. Of particular 
importance are the survival and longitudinal studies where the response variables may 
be dependent or causal. For survival data, Lu and Tsiatis (2008) have introduced 
a general model framework for covariate adjustment and derived a semiparametric 
efficient score. We believe a similar approach, which makes use of suitable covariate 
based constraints and achieves the asymptotic efficiency, can be developed. 

APPENDIX 

Here we provide proofs of the theoretical results presented in the previous sections. 
For not at ional convenience, let G n {(3) = max ||gf n S n>m = E(m rn (f3 )m^ (/3 )), 

S n , opt = £(m£(/3 o )(m£09 o )) T ), D rn ~=E(dg rn ((3 )/d(3 T ), D mn = E(dm rn ((3 )/d(3 T ), 
and = £(dm^(/3 )/d/3 T ). 

Lemma 1. The probability that zero is outside the convex hull spanned by {g n i ,i = 
1, . . . , n} goes to zero as n — > oo. 

Proof. This follows from Lemma 4.2 in Hjort et al. (2009) and discussions thereof. 

□ 

Lemma 2. Under (i),(ii) and C2, the eigenvalues of S n (f3 ) are bounded away from 
and oo. 
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Proof. It can be shown by making use of proofs of condition (D4) and Lemma 4.5 
in Hjort et al. (2009). □ 



Lemma 3. Under (i),(ii) and C2, 



(15) 
(16) 

(17) 



sup 

ll/3-/3 ll<n- 1/3 



*n(A>) 

K((3) 



sup 

[|/3-/3 ||<n-V3 



\ n {(3) ~ S n {(3)- l g n {(3) 



P {n- l,2 rT) 
O p (n-^) 



Proof. Under (i),(ii) and C2, we can apply results in Portnoy (1988) to get 



(18) 

Under (i) 
(19) 



7? 



l,2 9nW\\=0 P {r 1 J 2 )- 



G n (f3) < Mr]l> = O p (ry 2 ). 



Write A n (/3) = A n (/3) u n (/3), where = 1. Then similar to jSJ), we can 

show that 



9nAP) 



n tl 1 + K(P)9n,M 



< u T n {(3)g n {(3y 









1 + 




G n ((3) 



mineig(S n ((3)), 



where mineig(M) stands for the minimum eigenvalue of the matrix M. Therefore, 
we have 



(20) 



A n (/3) (mineig(S n {(3)) - u T n {f3)g n {l3)G n {(3)) < u T n {(3)g n {f3), 



from which we know that (|15p holds due to ffl8l) , ffT9|) and Lemma [2j 

When — /3 || < 77 -1 / 3 , define 
(21) L n = max \S njj>k ((3) - S nJ . k (f3 )\ . 

Using the same technique as in Lemma [2J r n L n = o p (l) ensures that the minimum 
eigenvalue of S n (f3) is bounded away from zero. Since there are only finitely many 
terms in g r containing (3, due to the 5-method, this can be further reduced to 



2:5 



— /3 \\ = o(r~ 1 ), which is true under C2. By expanding g n (/3) in the n^ 1 / 3 neigh- 
borhood of (3 , we obtain g n ((3) = Op^n^ 1 ^ 3 ) uniformly in \\/3 — (3 \\ < n^ 1 ^ 3 . Then 
tTTHj) follows from equation ( 1201) . 

We know that A n (/3) satisfies the constraint n~ l YHi=i 9n,i{@)/ {^+^ n (P)g n ,i(fl)} 
0. which implies 
(22) 



i=i 



A n (/3) 



By the triangle inequality and some simple algebra, the final term in f)22p is bounded 
by O p (n-^ 3 rl /2 ). Since \\S n (f3)- l \\ = O p (l), §TB) follows from ([22]). □ 

Lemma 4. Under Conditions C1-C3, 



D T rn ^ g D rn -D T m Y D, 



oil) 



Proof. Let m rn = m(j3; Y, Z) + A n W n V n . Since A n W n V n does not involve f3, 
we have 



(23) 



•-1 



which by (iii), converges to D^S D m . 

Second, following Qin and Lawless (1994), for any n, we have 



1J r n z -'n,g J -'r n ~ U opt ^n, opt 1 -' opt > 



where = A op t(f3)g rn is a g-vector and A opt (/3) is the optimal linear combination 
of g r . So it suffices to show the following difference is zero: 



(24) 



^ opt ^n, opt 1 -' opt m z -'n,m 1J 'm- 



Given ( l23l) . ( J24l) is positive definite due to optimality. Furthermore, 



" opt*-'n,opt 1J opt 1 -'m*-'n,m J - y m 

D%t^n,optDopt — D m tl D m + D m il D m — D m T: n x m D m . 
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-1 



By Zhang et al. (2008), we know that -D^^ D m is the semiparametric efficiency 
bound, which implies the first difference is non-positive definite. Since the second 
difference is o p (l), we know (pM)) is nonpositive definite. □ 



Lemma 5. Under (i), (ii) and C2, j3 n — /3, 



< n 



-1/3 



Proof. We first consider (3 on the n 1//3 sphere of /3 , i.e. (3 — f3 = un 1 ' 3 , where 
it is a unit vector. On the one hand, by the Taylor series expansion and Lemma El 

n 

2 ^ log (l + \ T n ((3)g n ,M) = 2n\ T n ((3)g n ({3) - nXl(f3)S n (f3)X n (f3) + O p (r l J 2 ). 

i=l 

By (HZD, it is equivalent to ngl(f3)S^ 1 (/3)g n (/3) + O p (r}/ 2 ). By taking the Taylor 
series expansion at (3 , it equals to 



u T D T rn ^ g D r un^ + o p {n 1 ^), 



which is bounded below by O p (n 1 ^) by LemmaHl On the other hand, 2 Yli=i 1°S ( 1 



X n ((3 )g n j(/3 ) ) = O p (r n ), which is strictly less than O p (n 1 / 3 ) by condition C2. 



Therefore, 



0„-A> 



< n 



-1/3 



□ 



Lemma 6. Under conditions C1-C3, we have the asymptotic normality of the "in- 
fluence function" 



Proof. We can reduce the problem to the unidimensional case by noting that it 
suffices to show that for any q x 1 vector t, 



(25) 



t T D T r Y-yi*g n {(3 ) -+ iV(0,i T D T m E 



First, the variance of the left hand side of ( 1251) is t D rn T, n g D rn t, which converges 
to t T D m Yi D m t by Lemma HJ 
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Second, we verify the Lindeberg condition (Billingsley 1986) 



i=i 

E{ [t T D T rn ^ g g rn {P ) 



1/2 J J 



} 



where the last step comes from 



which goes to since the numerator is asymptotically bounded. Hence Lemma [6] 
holds by the Lindeberg-Feller Central Limit Theorem. □ 

Proof of Theorem^ Let U n (f3,\) = I ELi 7tJ*£W and = * ^ T^f^f • 

We know that (/3 n , A n ) satisfies U n ((3 n , X n ) = and V n (f3 n , A n ) = 0. By taking the 
Taylor series expansion, we have 

= C/„(3 n ,A n ) 

(26) = ^J/3 ) + Z> T (/3 )(3 n -/3 )-5„(/3 )A n + O p (n- 2 / 3 ry 2 ), and 
= F n (3„,A n ) 

(27) = D T ((3 )\ n + O p (n- 2 / 3 ). 

Solving (T26D and (JZZD for 3 n - /3 , we get, 

(28) n 1 / 2 (3 n -/3 ) = -n 1 / 2 ( J D(/3 ) r 5; 1 (/3 )D( / 3 ))- 1 I)(/3 )5; 1 (/3 )^(/3 )+o p (l). 
By triangular inequality and Lemma HI we can show that 



(29) 



;z? T (/3 )5„- 1 (/3 ) J D(/3 ))^ 1 - (Dl^D^- 1 
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By Lemma El 



D (P )S~ 1 (f3 Q )n 1 ^g n (P ) = D^.Jl~y/ 2 g n {(3 Q ) + o p [ 

N^DlfT 1 D m ). 



Then Theorem 13.31 follows from ( 128]) . (129]) and Slutsky's Theorem. 



□ 



Proof of Theorem 13.51 Since there are only finitely many terms in g n and S n that 
contain (3, by the ^-method, we have 



(D T n )s- 1 n )D0 n )r 1 - (b T WS- 1 (f3 )b((3 ))- 1 = 0,(1) 



Then the result follows from (ESI). 



□ 



Proof of Theorem 13.61 Taking the Taylor series expansion, we get 

d 2 l ' 



n 



1/2 



(3 n -/3o) T ^ V2 (3n-/3o)+0 P (l) 



Then Theorem 13.31 implies T\ n — > xl as n — » oo, when ifo is true. 



□ 



Proof of Corollary 13.71 When only (3 l is specified in the null hypothesis, we write 
the likelihood ratio statistic as the sum of two differences, each of which can be 
expanded in a manner similar to that in Theorem 13.61 and we have 



T, 



2n 



2 log (l + K W9n,M) ~ 2 E l0 § I 1 + ^ (Pn)9n,i{Pn 
i=l i=l 

n n 

-[2£ log (l + K(Po)9nAP )) - 2 ^ log (l + Al(/3 10 , Mg nti w , 3 



i=i 



i=i 



1/2 



(/3io " 3m) T (^n " Au^Aai)^/ 2 ^ - An) + o P (l) 



The last equation comes from /3 2 o~ 020 = 02n~ $20+ A2i(/3 ln — /3 10 )+o p (l). Thus 
Corollary 13.71 holds because n 1//2 (/3 ln — /3 10 ) converges in distribution to N(0, (An — 
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A l2 A 22 l A^)' 1 ) under H Q . □ 

Proof of Theorem 13.81 Following the same steps as in the proof of Theorem 13.31 
we can show that 

n^A-^0 n -l3 O )^N(A-^h,I). 

Taking the Taylor series expansion of the empirical likelihood ratio test statistic at 
/3 , we have 

T ln = n^iX -f3 a + Vv / ^) T A(/3 )n 1 / 2 (3„ -f3 a + h/y/n) + o p (l), 

where the second equality comes from (3 a = f3 + h/y/n being a sequence of con- 
tiguous alternatives. Therefore, T ln — > \\ with noncentrality parameter h Ah as 
n — > oo under the alternative H a : (3 = (3 a = f3 + hj y/n. □ 

Proof of Corollary l3.9l Similar to the preceding proof, we have under the contiguous 
alternative 

T 2n = n l '\f3 10 - Anf(Aii - A 12 A 22 1 A 21 )n 1 / 2 ((3 10 - 3 ln ) + o p (l). 

Similar to Theorem I3.3[ we can show that when H a : (3 1 = (3 la = f3 10 + hi/ y/n is 
true, (An — Ai 2 A 22 l A 2 i) _1 / 2 n 1 / 2 (/3 10 — f3 ln ) converges in distribution to N((Au — 
Ai 2 A 22 A^Y 1 / 2 ^, I), which implies Corollary 13.91 □ 



In the following part of the APPENDIX, we verify that g* n in the examples follow- 
ing Corollary 13.41 satisfies Condition CI. The other conditions are satisfied trivially. 
Since the Fourier basis are naturally bounded by 1, the uniform boundedness reduces 
to the boundedness of m which is of finite dimension and usually holds easily. So (i) 
is satisfied. Let 

V n = ((li - 7ri)/7Ti,Sn,Cii, . . . ,s ldn ,c ldn , (l 2 - n 2 )/n 2 , s 21 , c 21 , . . . , s 2dn , c 2dn ) T 



2S 



and g. {(3) = (m T (/3; Y, Z), V T n ) 7 ', where 1 



Slj = y2(l i -7r i )sin(27rjF(X))/7r l , 



y/2(U - 7r i )cos(2 7 rjF(X))/7r i , z = 1,2, j 



d n . For notation simplicity, 



we omit W n in W n V n when there is no ambiguity. Then letting Id denote the dx d 
identity matrix, we have the following matrix partition 





E[mm T ) 


E(mV T n ) 






1— 7T1 T 


—Ildn+l 




E(V n m T ) 


— l2d n + l 


1 — 7T2 T 

n2 *2d n +l 



Thus, by some simple algebra and C3, we can show that the eigenvalues of S n 9 are 
bounded away from and oo. However, since F is unknown in practice, we typically 
use F n (x) = n~ l J27=i l{Xi<x} instead. Let 

■~ T 

V n( Z > X ) = ((ll - 7Ti)/7ri,Sii,Cii, . . . ,S ldn ,C ldn , (1 2 - 7T 2 ) /tT 2 , S 2 1 , Qjl , • • ■ ,S2d„,C 2 dJ 



. r 



,d n . Define e n = g r d^ - 



and = (m T ((3;Y,Z),V n (Z,X)) T , where % = >/2(l f - sin(27rjF ft (a;))/ 

= y/2(li - in) cos(2irjF n (x))/-Ki, i = 1,2, j = 1, 
9r n 9L Then 



71";. 



r n max |er^, J%fc | < 2M 2 r n \smird n (F n (X) - F(X))\ 

j,k 

= O p {rln-^). 

Following the argument in LemmaEJ when we let r n = o(n*), we know the eigenvalues 
of E(g r (f3 )g^ ((3 Q )) are also bounded away from zero and infinity. So (ii) holds. 

Moreover, let f(z,x) = J2k=o(^k — ^k)E(m((3;Y, Z)\Z = k,x) and A n be the 
Fourier coefficients in the Fourier expansion of f(z, x) with the Fourier basis specified 
in V n (z, x). We know from Fourier approximation theory that A n V n (z, x) — > f(z, x) 
uniformly. Thus, by Condition C3 and the Dominated Convergence Theorem, (iii) is 
satisfied. □ 



Proof of the validity of the plug- in estimator F n . Checking the derivation of all the 
theorems, we find that the following two conditions will guarantee the validity of the 
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theorems when F is replaced by F n 



(30) 
(31) 



n 



-1/2 



^ ^j i.9n,i 9n,i 



i=l 



n 



n 

-1 ST^ f * XT _ T \ 

/ j I yn,iiln,i 9n,i9n,i J 

i=l 



Op{l) 



where g n { and gr n j are g Tn and <) rn evaluated at the i th sample. The norm of a matrix 
M is defined to be sup ||Aftt||, where u is a unit vector. The sufficiency of the above 

u 

two conditions when the number of constraints is fixed can be seen from the existing 
literature (see, for example, Hjort et al. (2009)). 

Denote the j th component of a vector g by gK Then, for any j, we have 

2 



E 



n 



^ y (.9 n.i 9n 

i=l 

where C\ is a universal constant. Therefore 

n 

~ l ' 2 ^9n^-9n 



X\, . . . , X n 



<C l rl\\F n -F\ 



oo ' 



E 



n 



i=i 



2 










• ? x n | 



which converges to in probability due to C2. By Chebyshev's inequality, we know 
that for any e > 0, 



P 



n 



-1/2 



^ ^j (.9n,i 9n,i 



i=l 



> 6 



A 1, . . . , Xj 



Op(l), 



which implies (I3"0~|) due to the dominated convergence theorem. 

Denote e u = n' 1 Y%=i {9n,Jjl,—9 n ,i9l,i\ u - Then we have E {( £ i) 2 \ x ^ • • • > x 4 < 
Op(r^n~ 2 ) uniformly for u and j. Therefore, _E{||£ n || 2 | X\, . . . , X n } < O p (r^n~ 2 ) < 
o p (l), which implies f l3"Tj) . □ 
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Table 1. Bias and Standard Error Comparisons When Logit is Linear 
in X. 
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NOTE: In all the tables, 'marginal' means using empirical likelihood method with 2 
marginal estimating equations Y — + (5 2 Z) and z(y — (f>(j3i + f3 2 Z)^, while "5 
Fourier" has three additional estimating equations 2Z — 1, \/2(2Z — 1) sin[27rF n (X)] 
and \/2(2Z — 1) cos[27rF n (X)], where F n (X) is the empirical cumulative distribution 
function of X. MC Bias is Monte Carlo bias, OptStd is the asymptotic standard error 
obtained according to the sandwich formula, MC Std is the Monte Carlo standard 
error, CovProb is the coverage probability of 95% Wald confidence intervals and avlen 
is the average length of those confidence intervals. 



Table 2. Bias and Standard Error Comparisons When Logit is Qua- 
dratic in X. 



33 



Mpthnrl 


Tri lp (A 


MC Bias 


OptStd 


MC Std 


( .nvPrrth 


d V 1CL1 






X ~ X(0,0.5 2 ) 






marginal 


n c;oos 
U.ozyo 


0.0057 


0.2059 


0.2093 


u.yoio 


n qi fin 
U.olOU 




0.7758 


0.0094 


0.3169 


0.3266 


9536 


1.2683 


5 Fourier 


0.5298 


0.0061 


0.2059 


0.2088 


0.9480 


0.8090 




0.7758 


0.0088 


0.3169 


0.3257 


0.9524 


1.2523 






X 


~ X(0,1 2 


) 






marginal 


0.9664 


0.0106 


0.2182 


0.2307 


0.9476 


0.8845 




0.8105 


0.0182 


0.3466 


0.3795 


0.9494 


1.4450 


5 Fourier 


0.9664 


0.0111 


0.2182 


0.2254 


0.9448 


0.8604 




0.8105 


0.0156 


0.3466 


0.3648 


0.9502 


1.3800 



Table 3. Bias and Standard Error Comparisons When Logit Contains 
Two Covariates. 



Method True f3 MC Bias OptStd MC Std CovProb avlen 









Xi ~ iV(0,l 2 ),X 2 ~ 


iV(0,2 2 ) 










marginal 





.1061 


-0.0003 0.1649 


0.2005 


0. 


,9558 


0. 


,7883 




0. 


.3157 


0.0043 0.1828 


0.2933 





,9494 


1 


,1282 


7 Fourier 





.1061 


-0.0009 0.1649 


0.1761 


0. 


.9526 





.6813 







.3157 


0.0051 0.1828 


0.2311 


0. 


.9438 





.8716 








~ N 2 (0,1 2 ),X 2 ~ 


iV(0,2 2 ) 










marginal 


0. 


.4389 


0.0063 0.1688 


0.2032 





,9550 


0. 


.8069 







.5493 


0.0056 0.1985 


0.3072 


0. 


.9486 


1 


.2023 


7 Fourier 





.4389 


0.0052 0.1688 


0.1825 





.9494 


0. 


,7012 







.5493 


0.0041 0.1985 


0.2490 


0. 


.9458 


0, 


.9396 
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NOTE: The logit is either quadratic (X ~ iV 2 (-,-)) or linear (X ~ N(-, •)) in each 
covariate. 
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Table 4. Power Comparison When Logit is Linear in X. 
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Table 5 


. Power Comparison When Logit is 


Quadratic in X. 
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Table 6. Power Comparison When Logit Contains Two Covariates. 
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NOTE: In each scenario, f3i and f3 20 are the true values. The profile empirical 
likelihood ratio test is used to test the null hypothesis H : (3 2 = 0. CovProb are the 
coverage probabilities of tests /3 2 = /?2o- 



